
Tutorial KDD'14, New York

STATISTICALLY SOUND PATTERN DISCOVERY

Wilhelmiina Hämäläinen, University of Eastern Finland, [email protected]
Geoff Webb, Monash University, Australia, [email protected]

http://www.cs.joensuu.fi/pages/whamalai/kdd14/sspdtutorial.html
Statistically sound pattern discovery: Problem

[Figure: In the ideal world there is a clean, accurate and usually infinite POPULATION containing the REAL PATTERNS. In the real world we only have a SAMPLE, which may contain noise, and (with some tool) we obtain the PATTERNS FOUND FROM THE SAMPLE. How well do the two match?]

Statistically Sound vs. Unsound DM?

Pattern-type-first: Given a desired classical pattern type, invent a search method.

Method-first: Invent a new pattern type which has an easy search method, e.g., an antimonotonic interestingness property. Tricks to sell it: overload statistical terms; don't specify exactly.

Statistically Sound vs. Unsound DM?

Pattern-type-first: Given a desired classical pattern type, invent a search method.
+ easy to interpret correctly
+ informative
+ likely to hold in future
- computationally demanding

Method-first: Invent a new pattern type which has an easy search method.
- difficult to interpret → misleading information
- no guarantees on validity
+ computationally easy

Statistically sound pattern discovery: Scope

[Diagram: Among PATTERNS, the tutorial covers statistical dependency patterns: dependency rules (Part I, Wilhelmiina) and correlated itemsets (Part II, Geoff). Other patterns (time series? graphs?) and MODELS (log-linear models, classifiers) are touched on only in the Discussion.]

Contents

Overview (statistical dependency patterns)
Part I
  Dependency rules
  Statistical significance testing
Coffee break (10:00-10:30)
  Significance of improvement
Part II
  Correlated itemsets (self-sufficient itemsets)
  Significance tests for genuine set dependencies
Discussion

Statistical dependence: Many interpretations!

Events (X = x) and (Y = y) are statistically independent if P(X = x, Y = y) = P(X = x)P(Y = y).

When are variables (or variable-value combinations) statistically dependent? When is the dependency genuine? Which measures for the strength and significance of dependence? How to define mutual dependence between three or more variables?

Statistical dependence: 3 main interpretations

Let A, B, C be binary variables. Notate ¬A ≡ (A = 0) and A ≡ (A = 1).

1. Dependency rule AB → C: requires δ = P(ABC) - P(AB)P(C) > 0 (positive dependence).

2. Full probability model:
δ1 = P(ABC) - P(AB)P(C),
δ2 = P(A¬BC) - P(A¬B)P(C),
δ3 = P(¬ABC) - P(¬AB)P(C), and
δ4 = P(¬A¬BC) - P(¬A¬B)P(C).
If δ1 = δ2 = δ3 = δ4 = 0, there is no dependence; otherwise decide from the δi (i = 1, ..., 4) (with some equation).

Statistical dependence: 3 interpretations

3. Correlated set ABC. The starting point is mutual independence:
P(A = a, B = b, C = c) = P(A = a)P(B = b)P(C = c) for all a, b, c ∈ {0, 1}.
There are different variations (and names)! E.g.
(i) P(ABC) > P(A)P(B)P(C) (positive dependence), or
(ii) P(A = a, B = b, C = c) ≠ P(A = a)P(B = b)P(C = c) for some a, b, c ∈ {0, 1},
+ extra criteria.

In addition, conditional independence is sometimes useful:
P(B = b, C = c | A = a) = P(B = b | A = a)P(C = c | A = a).

Statistical dependence: no single correct definition

"One of the most important problems in the philosophy of natural sciences is (in addition to the well-known one regarding the essence of the concept of probability itself) to make precise the premises which would make it possible to regard any given real events as independent."

A.N. Kolmogorov

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

1. Statistical dependency rules

Requirements for a genuine statistical dependency rule X → A:
(i) Statistical dependence
(ii) Statistically significant: likely not due to chance
(iii) Non-redundant: not a side-product of another dependency; offers added value

Why?

Example: Dependency rules on atherosclerosis

1. Statistical dependencies:
smoking → atherosclerosis
sports → atherosclerosis
ABCA1-R219K → atherosclerosis?

2. Statistical significance?
spruce sprout extract → atherosclerosis?
dark chocolate → atherosclerosis

3. Redundancy?
stress, smoking → atherosclerosis
smoking, coffee → atherosclerosis?
high cholesterol, sports → atherosclerosis?
male, male pattern baldness → atherosclerosis?

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

2. Variable-based vs. Value-based interpretation

Meaning of the dependency rule X → A:

1. Variable-based: a dependency between binary variables X and A. A positive dependency X → A is the same as ¬X → ¬A, and equally strong as the negative dependency between X and ¬A (or ¬X and A).

2. Value-based: a positive dependency between the values X = 1 and A = 1; different from ¬X → ¬A, which may be weak!

Strength of statistical dependence

The most common measures:

1. Variable-based: leverage δ(X, A) = P(XA) - P(X)P(A)

2. Value-based: lift γ(X, A) = P(XA) / (P(X)P(A)) = P(A|X)/P(A) = P(X|A)/P(X)

P(A|X) = the confidence of the rule. Remember: X ≡ (X = 1) and A ≡ (A = 1).
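For concreteness, a minimal Python sketch (not part of the original slides) of both measures, computed from raw counts; the names fr_xa, fr_x, fr_a and n are assumed shorthands for fr(XA), fr(X), fr(A) and the data size:

    def leverage(fr_xa, fr_x, fr_a, n):
        # variable-based strength: delta = P(XA) - P(X)P(A)
        return fr_xa / n - (fr_x / n) * (fr_a / n)

    def lift(fr_xa, fr_x, fr_a, n):
        # value-based strength: gamma = P(XA) / (P(X)P(A))
        return (fr_xa / n) / ((fr_x / n) * (fr_a / n))

    # apple example from the following slides: 100 apples, 60 red,
    # 55 sweet, 55 red-and-sweet -> delta = 0.22, gamma = 1.67
    print(leverage(55, 60, 55, 100), lift(55, 60, 55, 100))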

Contingency table

        A                             ¬A                              All
X       fr(XA) = n[P(X)P(A) + δ]      fr(X¬A) = n[P(X)P(¬A) - δ]      fr(X)
¬X      fr(¬XA) = n[P(¬X)P(A) - δ]    fr(¬X¬A) = n[P(¬X)P(¬A) + δ]    fr(¬X)
All     fr(A)                         fr(¬A)                          n

All value combinations have the same |δ|! γ depends on the value combination.

fr(X) = absolute frequency of X; P(X) = relative frequency of X.

Example: The Apple problem

Variables: taste, smell, colour, size, weight, variety, grower, ...

100 apples (55 sweet + 45 bitter).

Rule RED → SWEET (Y → A)

P(A|Y) = 0.92, P(¬A|¬Y) = 1.0; δ = 0.22, γ = 1.67
(A = sweet, ¬A = bitter; Y = red, ¬Y = green)

Basket 1: 60 red apples (55 sweet). Basket 2: 40 green apples (all bitter).

Rule RED AND BIG → SWEET (X → A)

P(A|X) = 1.0, P(¬A|¬X) = 0.75; δ = 0.18, γ = 1.82
(X = (red ∧ big), ¬X = (green ∨ small))

Basket 1: 40 large red apples (all sweet). Basket 2: 40 green + 20 small red apples (45 bitter).

When could the value-based interpretation be useful? Example

D = disease, X = allele combination; P(X) is small and P(D|X) = 1.0.

γ(X, D) = P(D|X)/P(D) = P(D)^(-1) can be large.
δ(X, D) = P(XD) - P(X)P(D) = P(X)P(¬D) is small.

Now the dependency is strong in the value-based but weak in the variable-based interpretation!

(Usually, variable-based dependencies tend to be more reliable.)

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

3. Statistical significance of X → A

What is the probability of the observed or a stronger dependency, if X and A were actually independent? If the probability is small, then X → A is likely genuine (not due to chance).

A significant X → A is likely to hold in the future (in similar data sets).

How to estimate the probability? How small should the probability be?
Fisherian vs. Neyman-Pearsonian schools; the multiple testing problem.

3.1 Main approaches

[Diagram: SIGNIFICANCE TESTING divides into EMPIRICAL and ANALYTIC approaches; the analytic branch divides into FREQUENTIST and BAYESIAN (different schools), each with different sampling models.]

Analytic approaches

H0: X and A are independent (null hypothesis)
H1: X and A are positively dependent (research hypothesis)

Frequentist: Calculate p = P(observed or stronger dependency | H0).

Bayesian:
(i) Set P(H0) and P(H1).
(ii) Calculate P(observed or stronger dependency | H0) and P(observed or stronger dependency | H1).
(iii) Derive (with Bayes' rule) P(H0 | observed or stronger dependency) and P(H1 | observed or stronger dependency).

Analytic approaches: pros and cons

+ p-values are relatively fast to calculate
+ can be used as search criteria
- How to define the distribution under H0? (assumptions)
- If the data is not representative, the discoveries cannot be generalized to the whole population; they describe only the sample data or other similar samples; random samples are not always possible (infinite population)

Note: Differences between the Fisherian vs. Neyman-Pearsonian schools

significance testing vs. hypothesis testing
the role of nominal p-values (thresholds 0.05, 0.01)
many textbooks represent a hybrid approach
see Hubbard & Bayarri

Empirical approach (randomization testing)

Generate random data sets according to H0 and test how many of them contain the observed or a stronger dependency X → A.

(i) Fix a permutation scheme (how to express H0 + which properties of the original data should hold).
(ii) Generate a random subset {d1, ..., db} of all possible permutations.
(iii) p = |{di | di contains the observed or stronger dependency}| / b
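A minimal Python sketch of steps (i)-(iii), assuming the data is a list of (x, a) value pairs and the permutation scheme shuffles the A-column (which fixes n, fr(X) and fr(A), as noted on a later slide); leverage is used as the strength measure:

    import random

    def leverage(pairs):
        n = len(pairs)
        fr_x = sum(x for x, _ in pairs)
        fr_a = sum(a for _, a in pairs)
        fr_xa = sum(1 for x, a in pairs if x and a)
        return fr_xa / n - (fr_x / n) * (fr_a / n)

    def randomization_p(pairs, b=1000, seed=0):
        rng = random.Random(seed)
        observed = leverage(pairs)
        xs = [x for x, _ in pairs]
        a_col = [a for _, a in pairs]
        hits = 0
        for _ in range(b):
            rng.shuffle(a_col)            # one random permutation d_i
            if leverage(list(zip(xs, a_col))) >= observed:
                hits += 1
        return hits / b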

Empirical approach: pros and cons

+ no assumptions on any underlying parametric distribution
+ can test null hypotheses for which no closed-form test exists
+ offers an approach to the multiple testing problem (later)
+ the data doesn't have to be a random sample → the discoveries hold for the whole "population" ... as defined by the permutation scheme
- often not clear (but critical) how to permute the data!
- computationally heavy (b: efficiency vs. quality trade-off)
- How to apply during search??

Note: Randomization test vs. Fisher's exact test

When testing the significance of X → A:
A natural permutation scheme fixes N = n, N_X = fr(X), N_A = fr(A).
A randomization test generates some random contingency tables with these constraints.
A full permutation test = Fisher's exact test, which studies all such contingency tables:
  faster to compute (analytically)
  produces more reliable results
No need for randomization tests here!

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
       variable-based
       value-based
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

3.2 Sampling models

= defining the distribution under H0. What do we assume fixed?

Variable-based dependencies: classical sampling models (statistics).
Value-based dependencies: several suggestions (data mining).

Basic idea

Given a sampling model M, let T = the set of all possible contingency tables.

1. Define the probability P(Ti | M) for contingency tables Ti ∈ T.
2. Define an extremeness relation Ti ⪰ Tj: Ti contains at least as strong a dependency X → A as Tj does; this depends on the strength measure, e.g., δ (variable-based) or γ (value-based).
3. Calculate p = Σ over {Ti ⪰ T0} of P(Ti | M), where T0 = our table.

Sampling models for variable-based dependencies

3 basic models:
1. Multinomial (N = n fixed)
2. Double binomial (N = n, N_X = fr(X) fixed)
3. Hypergeometric (Fisher's exact test) (N = n, N_A = fr(A), N_X = fr(X) fixed)

+ asymptotic measures (like χ²)

Multinomial model

Independence assumption: in the infinite urn, p_XA = p_X · p_A
(p_XA = the probability of red sweet apples).

[Figure: a sample of n apples is drawn from an INFINITE URN.]

Multinomial model

Ti is defined by the random variables N_XA, N_X¬A, N_¬XA, N_¬X¬A:

P(N_XA, N_X¬A, N_¬XA, N_¬X¬A | n, p_X, p_A)
  = C(n; N_XA, N_X¬A, N_¬XA, N_¬X¬A) · p_X^N_X (1 - p_X)^(n - N_X) · p_A^N_A (1 - p_A)^(n - N_A)

(C(n; ...) denotes the multinomial coefficient.)

p = Σ over {Ti ⪰ T0} of P(N_XA, N_X¬A, N_¬XA, N_¬X¬A | n, p_X, p_A)

p_X and p_A can be estimated from the data.

Double binomial model

Independence assumption: p_(A|X) = p_A = p_(A|¬X).

[Figure: TWO INFINITE URNS; a sample of fr(X) red apples is drawn from one and a sample of fr(¬X) green apples from the other.]

Double binomial model

Probability of red sweet apples:
P(N_XA | fr(X), p_A) = C(fr(X), N_XA) · p_A^N_XA (1 - p_A)^(fr(X) - N_XA)

Probability of green sweet apples:
P(N_¬XA | fr(¬X), p_A) = C(fr(¬X), N_¬XA) · p_A^N_¬XA (1 - p_A)^(fr(¬X) - N_¬XA)

(C(n, k) denotes the binomial coefficient.)

Double binomial model

Ti is defined by the variables N_XA and N_¬XA:

P(N_XA, N_¬XA | n, fr(X), fr(¬X), p_A) = C(fr(X), N_XA) · C(fr(¬X), N_¬XA) · p_A^N_A (1 - p_A)^(n - N_A)

p = Σ over {Ti ⪰ T0} of P(N_XA, N_¬XA | n, fr(X), fr(¬X), p_A)

Hypergeometric model (Fisher's exact test)

How many other similar urns have at least as strong a dependency as ours?

[Figure: OUR URN (n apples: fr(A) sweet + fr(¬A) bitter, fr(X) red + fr(¬X) green) among ALL C(n, fr(A)) SIMILAR URNS.]

Like in a full permutation test

[Figure: with n = 10, fr(X) = 6 and fr(A) = 3, the three A-values are assigned to the ten rows in all C(10, 3) = 120 possible ways: urn 1, urn 2, ..., urn 120.]

Hypergeometric model (Fisher's exact test)

The number of all possible similar urns (fixed N = n, N_X = fr(X) and N_A = fr(A)) is

Σ from i = 0 to fr(A) of C(fr(X), i) · C(fr(¬X), fr(A) - i) = C(n, fr(A)).

Now (Ti ⪰ T0) ⇔ (N_XA ≥ fr(XA)). Easy!

p_F = Σ over i ≥ 0 of [C(fr(X), fr(XA) + i) · C(fr(¬X), fr(¬XA) - i)] / C(n, fr(A))

Example: Comparison of p-values

[Plot: p as a function of fr(XA) for fr(X) = 50, fr(A) = 30, n = 100; curves for Fisher, double binomial and multinomial.]

[Plot: p as a function of fr(XA) for fr(X) = 300, fr(A) = 500, n = 1000; curves for Fisher, double binomial and multinomial.]

fr(XA)   multinomial   double binomial   Fisher (hypergeometric)
180      1.7e-05       1.8e-05           2.2e-05
200      2.3e-12       2.2e-12           3.0e-12
220      1.4e-22       7.3e-23           1.1e-22
240      2.9e-36       3.0e-37           4.4e-37
260      1.5e-53       4.2e-56           3.5e-56
280      1.3e-74       2.9e-80           1.6e-81
300      9.3e-100      3.5e-111          2.5e-119
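A minimal sketch of the exact p_F formula from the hypergeometric slide, standard library only; assuming the table above uses the margins of the preceding plot (fr(X) = 300, fr(A) = 500, n = 1000), it should reproduce the Fisher column:

    from math import comb

    def fisher_p(fr_xa, fr_x, fr_a, n):
        total = comb(n, fr_a)                      # number of "similar urns"
        return sum(comb(fr_x, k) * comb(n - fr_x, fr_a - k)
                   for k in range(fr_xa, min(fr_x, fr_a) + 1)) / total

    print(fisher_p(200, 300, 500, 1000))           # ~3.0e-12, cf. the table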

Asymptotic measures

Idea: p-values are estimated indirectly.

1. Select some nicely behaving measure M, e.g., one that asymptotically follows the normal or the χ² distribution.
2. Estimate P(M ≥ val), where M = val in our data.

Easy! (look at statistical tables) But the accuracy can be poor.

The χ²-measure

χ² = Σ over i ∈ {0,1} Σ over j ∈ {0,1} of n(P(X = i, A = j) - P(X = i)P(A = j))² / (P(X = i)P(A = j))
   = n(P(X, A) - P(X)P(A))² / (P(X)P(¬X)P(A)P(¬A))
   = nδ² / (P(X)P(¬X)P(A)P(¬A))

Very sensitive to underlying assumptions! All P(X = i)P(A = j) should be sufficiently large, and the corresponding hypergeometric distribution shouldn't be too skewed.

Mutual information

MI = log [ P(XA)^P(XA) · P(X¬A)^P(X¬A) · P(¬XA)^P(¬XA) · P(¬X¬A)^P(¬X¬A)
         / ( P(X)^P(X) · P(¬X)^P(¬X) · P(A)^P(A) · P(¬A)^P(¬A) ) ]

2n·MI = the log-likelihood ratio statistic; it asymptotically follows the χ²-distribution and usually gives more reliable results than the χ²-measure.
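A minimal sketch of both asymptotic measures for a 2x2 table, from the joint counts (natural logarithm, so 2*n*MI is the log-likelihood ratio statistic):

    from math import log

    def chi2(fr_xa, fr_x, fr_a, n):
        px, pa = fr_x / n, fr_a / n
        delta = fr_xa / n - px * pa
        return n * delta**2 / (px * (1 - px) * pa * (1 - pa))

    def mutual_information(fr_xa, fr_x, fr_a, n):
        mi = 0.0
        for joint, mx, ma in [(fr_xa, fr_x, fr_a),
                              (fr_x - fr_xa, fr_x, n - fr_a),
                              (fr_a - fr_xa, n - fr_x, fr_a),
                              (n - fr_x - fr_a + fr_xa, n - fr_x, n - fr_a)]:
            if joint:                      # 0 * log(0) is taken as 0
                mi += (joint / n) * log(joint * n / (mx * ma))
        return mi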

Comparison: Sampling models for variable-based dependencies

Multinomial: impractical, but useful for theoretical results.
Double binomial: not exchangeable, p(X → A) ≠ p(A → X) (in general).
Hypergeometric (Fisher's exact test): recommended; enables efficient search, reliable results.
Asymptotic measures: often sensitive to underlying assumptions;
  χ² very sensitive, not recommended;
  MI reliable, enables efficient search, approximates p_F.

Sampling models for value-based dependencies

Main choices:

1. Classical sampling models, but with a different extremeness relation: use lift γ to define a stronger dependency.
   Multinomial and double binomial: can differ much from the variable-based versions.
   Hypergeometric: leads to Fisher's exact test, again!

2. Binomial models + corresponding asymptotic measures.

Binomial model 1 (classical binomial test)

The probability of getting exactly N_XA sweet red apples and n - N_XA green or bitter apples is

p(N_XA | n, p_XA) = C(n, N_XA) · (p_XA)^N_XA (1 - p_XA)^(n - N_XA)

p(N_XA ≥ fr(XA) | n, p_XA) = Σ from i = fr(XA) to n of C(n, i) (p_XA)^i (1 - p_XA)^(n - i)

(or i = fr(XA), ..., min{fr(X), fr(A)})

Use the estimate p_XA = P(X)P(A). Note: N_X and N_A are unfixed.
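A minimal sketch of this classical binomial test, standard library only; p_XA is estimated as P(X)P(A) as on the slide:

    from math import comb

    def binom_tail(k, n, p):
        # P(N >= k) for N ~ Bin(n, p)
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    def binomial1_p(fr_xa, fr_x, fr_a, n):
        p_xa = (fr_x / n) * (fr_a / n)    # independence estimate of p_XA
        return binom_tail(fr_xa, n, p_xa)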

Corresponding asymptotic measure

z-score:

z1(X → A) = (fr(X, A) - nP(X)P(A)) / √(nP(X)P(A)(1 - P(X)P(A)))
          = √n · δ(X, A) / √(P(X)P(A)(1 - P(X)P(A)))
          = √(nP(XA)) · (γ(X, A) - 1) / √(γ(X, A) - P(X, A))

It asymptotically follows the normal distribution.

Binomial model 2 (suggested in DM)

Like the double binomial model, but forget the other urn!

[Figure: CONSIDER ONLY ONE OF THE TWO INFINITE URNS: a sample of fr(X) red apples; the urn of fr(¬X) green apples is ignored.]

Binomial model 2

p(N_XA ≥ fr(XA) | fr(X), P(A)) = Σ from i = fr(XA) to fr(X) of C(fr(X), i) · P(A)^i P(¬A)^(fr(X) - i)

Corresponding z-score:

z2 = (fr(XA) - fr(X)P(A)) / √(fr(X)P(A)P(¬A))
   = √n · δ(X, A) / √(P(X)P(A)P(¬A))
   = √(fr(X)) · (P(A|X) - P(A)) / √(P(A)P(¬A))
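A minimal sketch of the two z-scores (z1 from the classical binomial model, z2 from the one-urn model); both are compared against the standard normal tail:

    from math import sqrt

    def z1(fr_xa, fr_x, fr_a, n):
        expected = n * (fr_x / n) * (fr_a / n)   # expected fr(XA) under H0
        return (fr_xa - expected) / sqrt(expected * (1 - expected / n))

    def z2(fr_xa, fr_x, fr_a, n):
        pa = fr_a / n
        return (fr_xa - fr_x * pa) / sqrt(fr_x * pa * (1 - pa))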

J-measure

A one-urn version of MI:

J = P(XA) · log [P(XA) / (P(X)P(A))] + P(X¬A) · log [P(X¬A) / (P(X)P(¬A))]

Example: Comparison of p-values

[Plots: p as a function of fr(XA) for fr(X) = 25, fr(A) = 75, n = 100 (left) and fr(X) = 75, fr(A) = 25, n = 100 (right); curves for bin1, bin2, Fisher, double binomial and multinomial.]

Comparison: Sampling models for value-based dependencies

Multinomial, hypergeometric, and the classical binomial + its z-score: p(X → A) = p(A → X).
Double binomial and the alternative binomial + its z-score: p(X → A) ≠ p(A → X) (in general).
The alternative binomial, its z-score and J can disagree with the other measures (they see only the X-urn, not the whole data).
The z-score is easy to integrate into search, but may be unreliable for infrequent patterns.
The (classical) binomial test in post-pruning improves quality!

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

3.3 Multiple testing problem

The more patterns we test, the more spurious patterns we are likely to accept.

If the threshold is α = 0.05, there is a 5% probability that a spurious dependency passes the test. If we test 10,000 rules, we are likely to accept 500 spurious rules!

Solutions to the multiple testing problem

1. Direct adjustment approach: adjust α (stricter thresholds); easiest to integrate into the search.
2. Holdout approach: save part of the data for testing (Webb).
3. Randomization test approaches: estimate the overall significance of all discoveries, or adjust the individual p-values empirically (e.g., Gionis et al., Hanhijärvi et al.).

Contingency table for m significance tests

                        spurious rule (H0 true)   genuine rule (H1 true)   All
declared significant    V (false positives)       S (true positives)       R
declared insignificant  U (true negatives)        T (false negatives)      m - R
All                     m0                        m - m0                   m

Direct adjustment: Two approaches

(i) Control the familywise error rate = the probability of accepting at least one false discovery: FWER = P(V ≥ 1).

(ii) Control the false discovery rate = the expected proportion of false discoveries: FDR = E[V/R].

(V, R, m0 and m as in the table above.)

(i) Control the familywise error rate FWER

Decide α = FWER and calculate a new, stricter per-test threshold α*.

If the tests are mutually independent: α = 1 - (1 - α*)^m;
Šidák correction: α* = 1 - (1 - α)^(1/m).

If they are not independent: α ≤ mα*;
Bonferroni correction: α* = α/m;
conservative (may lose genuine discoveries).

How to estimate m? There may be both explicit and implicit testing during the search.

The Holm-Bonferroni method is more powerful, but less suitable for the search (all p-values should be known first).

(ii) Control the false discovery rate FDR

Benjamini-Hochberg(-Yekutieli) procedure:

1. Decide q = FDR.
2. Order the patterns r_i by their p-values: r_1, ..., r_m such that p_1 ≤ ... ≤ p_m.
3. Search for the largest k such that p_k ≤ kq / (m·c(m));
   if the tests are mutually independent or positively dependent, c(m) = 1;
   otherwise c(m) = Σ from i = 1 to m of 1/i ≈ ln(m) + 0.58.
4. Save patterns r_1, ..., r_k (as significant) and reject r_(k+1), ..., r_m.
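A minimal sketch of this step-up procedure; it returns the indices of the accepted patterns:

    def bh_select(pvalues, q=0.05, independent=True):
        m = len(pvalues)
        c = 1.0 if independent else sum(1.0 / i for i in range(1, m + 1))
        order = sorted(range(m), key=lambda i: pvalues[i])
        k = 0
        for rank, idx in enumerate(order, start=1):
            if pvalues[idx] <= rank * q / (m * c):
                k = rank                # largest rank passing the bound
        return order[:k]                # accepted patterns r_1, ..., r_k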

Hold-out approach

Powerful because m is quite small!

[Diagram: the Data is split into an exploratory set and a holdout set. Pattern discovery on the exploratory set produces candidate patterns; statistical evaluation on the holdout set (any hypothesis test, with a multiple-testing correction over the small m) yields sound patterns with limited type-2 error.]

Randomization test approaches

1. Estimate the overall significance of all discoveries at once, e.g.: what is the probability of finding K0 dependency rules whose strength is at least min_M?

Empirical p-value: p_emp = (|{d_i | K_i ≥ K_0}| + 1) / (b + 1)

d_0 = the original data set; d_1, ..., d_b = random data sets; K_i = the number of patterns discovered from set d_i.

(Gionis et al.)
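A minimal sketch of the empirical p-value; counts_random holds K_1, ..., K_b (the numbers of patterns found in the randomized data sets) and k0 = K_0:

    def empirical_p(k0, counts_random):
        b = len(counts_random)
        return (sum(1 for k in counts_random if k >= k0) + 1) / (b + 1)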

Randomization test approaches (cont.)

2. Use randomization tests to correct individual p-values, e.g.: how many sets contained better rules than X → A?

p = (|{d_i | min over (Y → B) ∈ S_i of p(Y → B | d_i) ≤ p(X → A | d_0)}| + 1) / (b + 1)

d_0 = the original data set; d_1, ..., d_b = random data sets; S_i = the set of patterns returned from set d_i.

(Hanhijärvi)

Randomization test approaches

+ dependencies between patterns are not a problem → more powerful control over FWER
+ one can impose extra constraints (e.g., that a certain pattern holds with a given frequency and confidence)
- most techniques assume subset pivotality: the complete null hypothesis and all subsets of true null hypotheses have the same distribution of the measure statistic

Remember also the points mentioned for single hypothesis testing.

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

4. Redundancy and significance of improvement

When is X → A redundant with respect to Y → A (Y ⊊ X)? When does X improve it significantly?

Examples of redundant dependency rules:
smoking, coffee → atherosclerosis: coffee has no effect on smoking → atherosclerosis
high cholesterol, sports → atherosclerosis: sports makes the dependency only weaker
male, male pattern baldness → atherosclerosis: adding male brings hardly any significant improvement

Redundancy and significance of improvement

Value-based: X → A is productive if P(A|X) > P(A|Y) for all Y ⊊ X.

Variable-based: X → A is redundant if there is Y ⊊ X such that M(Y → A) is better than M(X → A) with the given goodness measure M; X → A is non-redundant if M(X → A) is better than M(Y → A) for all Y ⊊ X.

When is the improvement significant?

Value-based: Significance of productivity

Hypergeometric model:

p(YQ → A | Y → A) = Σ over i of [C(fr(YQ), fr(YQA) + i) · C(fr(Y¬Q), fr(Y¬QA) - i)] / C(fr(Y), fr(YA))

= the probability of the observed or a stronger conditional dependency Q → A, given Y, in the value-based model.

Also asymptotic measures (χ², MI) can be used.
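A minimal sketch: the test is Fisher's exact test restricted to the subpopulation covered by Y (the next slide's numbers come out as ~0.0029):

    from math import comb

    def productivity_p(fr_yqa, fr_yq, fr_ya, fr_y):
        total = comb(fr_y, fr_ya)
        return sum(comb(fr_yq, k) * comb(fr_y - fr_yq, fr_ya - k)
                   for k in range(fr_yqa, min(fr_yq, fr_ya) + 1)) / total

    # next slide: among 60 red apples, 40 are large (all sweet), 55 are sweet
    print(productivity_p(40, 40, 55, 60))   # ~0.0029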

Apple problem: value-based

p(YQ → A | Y → A) = 0.0029, with Y = red, Q = large.

[Baskets: 40 large red apples (all sweet); 20 small red apples (15 sweet); 40 green apples (all bitter).]

Apple problem: variable-based?

p(Y → A | (YQ) → A) = 2.9e-10

Part I Contents

1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
   3.1 Approaches
   3.2 Sampling models
   3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies

5. Search strategies

1. Search for the strongest rules (with γ, δ, etc.) that pass the significance test for productivity: MagnumOpus (Webb 2005).

2. Search for the most significant non-redundant rules (with Fisher's p_F etc.): Kingfisher (Hämäläinen 2012).

3. Search for frequent sets, construct association rules, prune with statistical measures, and filter non-redundant rules?? No way!
   Closed sets? Redundancy problem. Their minimal generators?

Main problem: non-monotonicity of statistical dependence

AB → C can express a significant dependency even if A and C, as well as B and C, are mutually independent. In the worst case, the only significant dependency involves all attributes A1, ..., Ak (e.g., A1...A(k-1) → Ak).

1) A greedy heuristic does not work!
2) Studying only the simplest dependency rules does not reveal everything!

ABCA1-R219K → alzheimer
ABCA1-R219K, female → alzheimer

End of Part I

Questions?

Statistically sound pattern discovery
Part II: Itemsets

Wilhelmiina Hämäläinen, Geoff Webb

http://www.cs.joensuu.fi/~whamalai/ecmlpkdd13/sspdtutorial.html

Overview

Most association discovery techniques find rules. Association is often conceived as a relationship between two parts, so rules provide an intuitive representation.

However, when many items are all mutually interdependent, a plethora of rules results. Itemsets can provide a more intuitive representation in many contexts.

However, it is less obvious how to identify potentially interesting itemsets than potentially interesting rules.

Rules

bruises?=true → ring-type=pendant
[Coverage=3376; Support=3184; Lift=1.93; p ...]

stalk-surface-above-ring=smooth & ring-type=pendant → stalk-surface-below-ring=smooth
[Coverage=3664; Support=3328; Lift=1.49; p=3.05E-072]

bruises?=true → stalk-surface-above-ring=smooth
[Coverage=3376; Support=3232; Lift=1.50; p ...]

bruises?=true, stalk-surface-above-ring=smooth, stalk-surface-below-ring=smooth, ring-type=pendant
[Coverage=2776; Leverage=0.1143; p ...]

An association between two items will be represented by two rules;
three items → nine rules;
four items → twenty-eight rules; ...

It may not be apparent that all the resulting rules represent a single multi-item association.

Reference: Webb 2011

But how to find itemsets?

Association is conceived as deviation from independence between two parts, but itemsets may have many parts.

Main approaches

Consider all partitions
Randomisation testing
Incremental mining → models
Information theoretic → mainly models, not statistical

All partitions

Most rule-based measures of interest relate to the difference between the joint frequency and the expected frequency under independence between the antecedent and the consequent. However, itemsets do not have an antecedent and a consequent.

It does not work to consider deviation from expectation only if all items are independent of each other: if P(x, y) ≠ P(x)P(y) and P(x, y, z) = P(x, y)P(z), then P(x, y, z) ≠ P(x)P(y)P(z).

Example: AttendingKDD14, InNewYork, AgeIsEven.

References: Webb 2010; Webb & Vreeken 2014

Productive itemsets

An itemset is unlikely to be interesting if its frequency can be predicted by assuming independence between any partition thereof:

{Pregnant, Oedema, AgeIsEven} → partition {Pregnant, Oedema} × {AgeIsEven}
{Male, PoorEyesight, ProstateCancer, Glasses} → partition {Male, ProstateCancer} × {PoorEyesight, Glasses}

References: Webb 2010; Webb & Vreeken 2014

Measuring degree of positive association

Measure the degree of positive association as the deviation from the maximum of the expected frequency under an assumption of independence between any partition of the itemset, e.g., the itemset leverage

δ(I) = P(I) - max over partitions {X, Y} of I of P(X)P(Y).

References: Webb 2010; Webb & Vreeken 2014

Statistical test for productivity

Null hypothesis: some partition {X, Y} of the itemset satisfies P(XY) = P(X)P(Y).

Use a Fisher exact test on every partition; this is equivalent to testing that every rule X → Y is significant.

No correction for multiple testing: the null hypothesis is only rejected if the corresponding null hypothesis is rejected for every partition. This increases the risk of Type 2 rather than Type 1 error.

References: Webb 2010; Webb & Vreeken 2014
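A minimal sketch of the partition test; cover(s) is an assumed helper returning the ids of the transactions containing all items of s, and alpha is the per-itemset critical value:

    from itertools import combinations
    from math import comb

    def fisher_p(k, fx, fa, n):
        return sum(comb(fx, i) * comb(n - fx, fa - i)
                   for i in range(k, min(fx, fa) + 1)) / comb(n, fa)

    def is_productive(itemset, cover, n, alpha=0.05):
        items = sorted(itemset)
        joint = len(cover(items))
        for r in range(1, len(items)):
            for left in combinations(items, r):
                right = [i for i in items if i not in left]
                # some partition already explains the joint frequency
                if fisher_p(joint, len(cover(left)), len(cover(right)), n) > alpha:
                    return False
        return True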

Redundancy

If item X is a necessary consequence of another set of items Y, then {X} ∪ Y should be associated with everything with which Y is associated. E.g., pregnant → female, so {female, pregnant, oedema} is not likely to be interesting if pregnant → oedema is known.

Discard itemsets I where ∃X ⊆ I, ∃Y ⊂ X: P(Y) = P(X).
I = {female, pregnant, oedema}, X = {female, pregnant}, Y = {pregnant}.

Note: no statistical test.

References: Webb 2010; Webb & Vreeken 2014

Independent productivity

Suppose that heat, oxygen and fuel are all required for fire. Then heat, oxygen and fuel are associated with fire; so too are {heat, oxygen}, {heat, fuel}, {oxygen, fuel}, {heat}, {oxygen} and {fuel}.

But these six are potentially misleading given the full association.

References: Webb 2010; Webb & Vreeken 2014

Independent productivity

An itemset is unlikely to be interesting if its frequency can be predicted from the frequency of its specialisations. If both X and X ∪ Y are non-redundant and productive, then X is only likely to be interesting if it holds with respect to ¬Y (i.e., in the data without Y). P is the set of all non-redundant and productive patterns.

References: Webb 2010; Webb & Vreeken 2014

Assessing independent productivity

Given {fuel, oxygen, heat, fire}:

to assess {fuel, oxygen, fire}, check whether the association holds in the data without heat;

to assess {fuel, oxygen}, check whether the association holds in the data without heat or fire.

Reference: Webb 2010

Mushroom

118 items, 8124 examples.
9676 non-redundant productive itemsets (α = 0.05); 3164 of them are not independently productive.

edible=e, odor=n [support=3408, leverage=0.194559, p ...]

We typically want to discover associations that generalise beyond the given data. The massive search involved in association discovery results in a massive risk of false discoveries: associations that appear to hold in the sample but do not hold in the generating process.

Reference: Webb 2007

Bonferroni correction for multiple testing

Divide the critical value by the size of the search space. E.g., retail: 16470 items; critical value = 0.05 / 2^16470 < 5E-4000.

Use layered critical values instead: the sum of all critical values cannot exceed the familywise critical value. Allocate different familywise critical values to different itemset sizes, and divide each size's share by the number of itemsets of that size |I|.

E.g., retail (16470 items): the critical value for itemsets of size 2 = (0.05/2) / C(16470, 2) ≈ 1.84E-10.

Reference: Webb 2007, 2008
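A minimal sketch of the arithmetic for the retail example; the size-2 share of alpha is taken here as alpha/2, an assumption that reproduces the slide's number:

    from math import comb, log10

    alpha, items = 0.05, 16470
    log10_naive = log10(alpha) - items * log10(2)   # ~ -4959: below any float
    layered_size2 = (alpha / 2) / comb(items, 2)
    print(log10_naive, layered_size2)               # layered ~1.84e-10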

Randomization testing

Randomization testing can be used to find significant itemsets. All the advantages and disadvantages enumerated for dependency rules apply. However, it is not possible to efficiently test for productivity or independent productivity using randomisation testing.

References: Megiddo & Srikant 1998; Gionis et al. 2007

Incremental and interactive mining

Iteratively find the most informative itemset relative to those found so far. May have a human in the loop. These methods aim to model the full joint distribution: they will tend to develop more succinct collections of itemsets than self-sufficient itemsets, but will necessarily choose between one of many potential such collections.

References: Hanhijärvi et al. 2009; Lijffijt et al. 2014

Belgian lottery

{43, 44, 45}: 902 frequent itemsets (min sup = 1%). All are closed and all are non-derivable. KRIMP selects 232 itemsets. MTV selects no itemsets.

DOCWORD.NIPS: Top-25 leverage itemsets

kaufmann,morgan; top,bottom; trained,training; report,technical; san,mateo; mit,cambridge; descent,gradient; mateo,morgan; image,images; san,mateo,morgan; mit,press; grant,supported; morgan,advances; springer,verlag; san,morgan; kaufmann,mateo; san,kaufmann,mateo; distribution,probability; conference,international; conference,proceeding; hidden,trained; kaufmann,mateo,morgan; learn,learned; san,kaufmann,mateo,morgan; hidden,training

Reference: Webb & Vreeken 2014

DOCWORD.NIPS: Top-25 leverage rules

kaufmann → morgan; morgan → kaufmann; abstract,morgan → kaufmann; abstract,kaufmann → morgan; references,morgan → kaufmann; references,kaufmann → morgan; abstract,references,morgan → kaufmann; abstract,references,kaufmann → morgan; system,morgan → kaufmann; system,kaufmann → morgan; neural,kaufmann → morgan; neural,morgan → kaufmann; abstract,system,kaufmann → morgan; abstract,system,morgan → kaufmann; abstract,neural,morgan → kaufmann; abstract,neural,kaufmann → morgan; result,kaufmann → morgan; result,morgan → kaufmann; references,system,morgan → kaufmann; neural,references,kaufmann → morgan; neural,references,morgan → kaufmann; abstract,references,system,morgan → kaufmann; abstract,references,system,kaufmann → morgan; abstract,result,kaufmann → morgan; abstract,neural,references,kaufmann → morgan

Reference: Webb & Vreeken 2014

DOCWORD.NIPS: Top-25 frequent (closed) itemsets

abstract,references; abstract,result; references,system; references,result; abstract,function; abstract,references,result; abstract,neural; abstract,system; function,references; abstract,set; abstract,function,references; neural,references; function,result; abstract,neural,references; references,set; neural,result; abstract,function,result; abstract,introduction; abstract,references,system; abstract,references,set; result,system; result,set; abstract,neural,result; abstract,network; abstract,number

Reference: Webb & Vreeken 2014

DOCWORD.NIPS: Top-25 lift self-sufficient itemsets

duane,leapfrog; americana,periplaneta; ekman,hager; lpnn,petek; alessandro,sperduti; crippa,ghiselli; chorale,harmonization; iiiiiiii,iiiiiiiiiii; artery,coronary; kerszberg,linster; nuno,vasconcelos; brasher,krug; mizumori,postsubiculum; implantable,pickard; zag,zig; petek,schmidbauer; chorale,harmonet; deerwester,dumais; harmonet,harmonization; fodor,pylyshyn; jeremy,bonet; ornstein,uhlenbeck; nakashima,satoshi; taube,postsubiculum; iceg,implantable

Closure of duane,leapfrog (all words in all 4 documents):

abstract, according, algorithm, approach, approximation, bayesian, carlo, case, cases, computation, computer, defined, department, discarded, distribution, duane, dynamic, dynamical, energy, equation, error, estimate, exp, form, found, framework, function, gaussian, general, gradient, hamiltonian, hidden, hybrid, input, integral, iteration, keeping, kinetic, large, leapfrog, learning, letter, level, linear, log, low, mackay, marginal, mean, method, metropolis, model, momentum, monte, neal, network, neural, noise, non, number, obtained, output, parameter, performance, phase, physic, point, posterior, prediction, prior, probability, problem, references, rejection, required, result, run, sample, sampling, science, set, simulating, simulation, small, space, squared, step, system, task, term, test, training, uniformly, unit, university, values, vol, weight, zero

Itemsets Summary

More attention has been paid to finding associations efficiently than to deciding which ones to find. While we cannot be certain what will be interesting, the following probably won't be:
  frequency explained by independence between a partition
  frequency explained by specialisations

Statistical testing is essential. Itemsets often provide a much more succinct summary of association than rules; rules provide more fine-grained detail and are useful if there is a specific item of interest.

Self-sufficient itemsets capture all of these principles, support comprehensible explanations for why itemsets are rejected, can be discovered efficiently, and often find small sets of patterns (mushroom: 9,676; retail: 13,663).

Reference: Novak et al. 2009

Software

OPUS Miner can be downloaded from:
http://www.csse.monash.edu.au/~webb/Software/opus_miner.tgz

Statistically sound pattern discovery: Current state and future challenges

Efficient and reliable algorithms exist for binary and categorical data: branch-and-bound style, with no minimum frequency thresholds (or harmless ones like 5/n).

Numeric variables: impact rules allow numerical consequences (Webb). The main challenge is numerical variables in the condition part of a rule and in itemsets: how to integrate an optimal discretization into the search? How to detect all redundant patterns? Long patterns.

End!

Questions?

All material: http://cs.joensuu.fi/pages/whamalai/kdd14/sspdtutorial.html