Treatment Learning: Implementation and Application

Ying Hu, Electrical & Computer Engineering, University of British Columbia


TRANSCRIPT

Page 1: Treatment Learning: Implementation and Application

Treatment Learning: Implementation and Application

Ying Hu

Electrical & Computer Engineering

University of British Columbia

Page 2: Treatment Learning: Implementation and Application

Ying Hu http://www.ece.ubc.ca/~yingh 2

Outline

1. An example
2. Background Review
3. TAR2 Treatment Learner
   • TARZAN: Tim Menzies
   • TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
   • TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion

Page 3: Treatment Learning: Implementation and Application

First Impression

Boston Housing Dataset (506 examples, 4 classes)

• C4.5's decision tree: [shown as a figure]
• Treatment learner:
  – high: 6.7 <= rooms < 9.8 and 12.6 <= parent-teacher ratio < 15.9
  – low: 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39

Page 4: Treatment Learning: Implementation and Application

Review: Background

What is KDD?
– KDD = Knowledge Discovery in Databases [fayyad96]
– Data mining: one step in the KDD process
– Machine learning: learning algorithms

Common data mining tasks
– Classification
  • Decision tree induction (C4.5) [quinlan86]
  • Nearest neighbors [cover67]
  • Neural networks [rosenblatt62]
  • Naive Bayes classifier [duda73]
– Association rule mining
  • APRIORI algorithm [agrawal93]
  • Variants of APRIORI

Page 5: Treatment Learning: Implementation and Application

Treatment Learning: Definition

– Input: a classified dataset
  • Assumption: classes are ordered
– Output: Rx = a conjunction of attribute-value pairs
  • Size of Rx = number of pairs in the Rx
– confidence(Rx w.r.t. Class) = P(Class | Rx)
– Goal: find an Rx that has different levels of confidence across classes
– Evaluating Rx: lift
– Output is presented in a visualization form
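The two measures above can be sketched in a few lines. The slide does not spell out the lift formula, so the version below is an assumption: the mean (ordered) class score of the examples matching Rx, divided by the baseline mean over all examples. The dataset, attribute names, and class scores are made up for illustration.

```python
def matches(row, rx):
    """True if the row satisfies every attribute-value pair in the treatment."""
    return all(row[a] == v for a, v in rx.items())

def confidence(rows, rx, cls):
    """confidence(Rx w.r.t. cls) = P(cls | Rx)."""
    hit = [r for r in rows if matches(r, rx)]
    return sum(1 for r in hit if r["class"] == cls) / len(hit) if hit else 0.0

def lift(rows, rx, score):
    """Assumed lift: mean class score of Rx-matching rows over the baseline mean."""
    hit = [r for r in rows if matches(r, rx)]
    base = sum(score[r["class"]] for r in rows) / len(rows)
    return (sum(score[r["class"]] for r in hit) / len(hit)) / base

# Toy classified dataset with ordered classes: low < high
rows = [
    {"rooms": "many", "nox": "low",  "class": "high"},
    {"rooms": "many", "nox": "low",  "class": "high"},
    {"rooms": "few",  "nox": "high", "class": "low"},
    {"rooms": "few",  "nox": "low",  "class": "low"},
]
rx = {"rooms": "many"}                    # a size-1 treatment
print(confidence(rows, rx, "high"))       # 1.0
print(round(lift(rows, rx, {"low": 1, "high": 2}), 2))  # 1.33
```

A treatment is "good" when it pushes the class distribution toward the best class, which is exactly what a lift above 1 indicates here.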

Page 6: Treatment Learning: Implementation and Application

Motivation: Narrow Funnel Effect

When is enough learning enough?
– Attributes reduced to < 50%, accuracy decreases only 3-5% [shavlik91]
– A 1-level decision tree is comparable to C4 [holte93]
– Data engineering: ignoring 81% of features results in a 2% increase in accuracy [kohavi97]
– Scheduling: random sampling outperforms complete (depth-first) search [crawford94]

Narrow funnel effect
– Control variables vs. derived variables
– Treatment learning: finding the funnel variables

Page 7: Treatment Learning: Implementation and Application

TAR2: The Algorithm

Search + attribute utility estimation
– Estimation heuristic: confidence1
– Search: depth-first search
  • Search space: confidence1 > threshold

Discretization: equal-width interval binning

Reporting Rx
– Lift(Rx) > threshold

Software package and online distribution
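The discretization step is the only part of the algorithm the slide pins down precisely; a minimal sketch of equal-width interval binning follows (the bin count and sample values are illustrative):

```python
def equal_width_bins(values, k):
    """Equal-width interval binning: split [min, max] into k equal intervals
    and map each value to its bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # clamp the maximum value into the last bin (index k - 1)
    return [min(int((v - lo) / width), k - 1) for v in values]

print(equal_width_bins([0.0, 1.0, 2.5, 9.9, 10.0], 5))  # [0, 0, 1, 4, 4]
```

After binning, each numeric attribute becomes a small set of discrete attribute-value pairs, which is what the confidence1 heuristic and depth-first search operate over.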

Page 8: Treatment Learning: Implementation and Application

The Pilot Case Study

Requirement optimization
– Goal: an optimal set of mitigations in a cost-effective manner

[Diagram: mitigations reduce risks and incur cost; risks relate to requirements; requirements achieve benefit]

Iterative learning cycle

Page 9: Treatment Learning: Implementation and Application

The Pilot Study (continued)

Cost-benefit distribution (30/99 mitigations)

Compared to Simulated Annealing

Page 10: Treatment Learning: Implementation and Application

Problem of TAR2

Runtime vs. Rx size
– To generate Rx of size r: C(N, r) candidate conjunctions, where N is the number of attribute-value pairs
– To generate Rx of sizes [1..N]: Σr C(N, r) = 2^N − 1 candidates, i.e. exponential in N

Page 11: Treatment Learning: Implementation and Application

TAR3: the improvement

Random sampling
– Key idea:
  • Treat the confidence1 distribution as a probability distribution
  • Sample Rx from the confidence1 distribution
– Steps:
  • Place the items (ai) in increasing order of confidence1 value
  • Compute the CDF of each ai
  • Sample a uniform value u in [0..1]
  • The sample is the least ai whose CDF > u
  • Repeat until an Rx of the given size is obtained
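The sampling steps above translate almost line for line into code; a sketch follows (the item names and confidence1 values are made up for illustration):

```python
import random

def sample_rx(items, conf1, size, rng=random):
    """Sample a treatment of `size` distinct items, where each item is drawn
    with probability proportional to its confidence1 value."""
    ordered = sorted(items, key=lambda a: conf1[a])   # increasing confidence1
    total = sum(conf1[a] for a in ordered)
    cdf, acc = [], 0.0
    for a in ordered:                                 # cumulative distribution
        acc += conf1[a] / total
        cdf.append(acc)
    rx = set()
    while len(rx) < size:
        u = rng.random()                              # uniform value in [0..1)
        # the sample is the least item whose CDF exceeds u
        rx.add(next(a for a, c in zip(ordered, cdf) if c > u))
    return rx

conf1 = {"a1": 0.1, "a2": 0.3, "a3": 0.6}
random.seed(0)
print(sample_rx(list(conf1), conf1, size=2))
```

This biases the search toward high-confidence1 items without enumerating all C(N, r) conjunctions, which is the source of TAR3's speedup over TAR2.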

Page 12: Treatment Learning: Implementation and Application

Comparison of Efficiency

[Charts: Runtime vs. data size; Runtime vs. attribute# (10-99 attributes, 0-30 sec, R² = 0.9436); Runtime vs. Rx size (treatment sizes 1-8, 0-120 sec, R² = 0.8836); Runtime vs. TAR2]

Page 13: Treatment Learning: Implementation and Application

Comparison of Results

Mean and STD in each round

Final Rx: TAR2=19, TAR3=20

10 UCI domains, identical best Rx

pilot2 dataset (58 × 30k)

Page 14: Treatment Learning: Implementation and Application

External Evaluation

[Diagram, FSS framework: all attributes (10 UCI datasets) → learning; TAR2 as feature subset selector → fewer attributes → learning; compare accuracy using C4.5 and Naive Bayes]

Page 15: Treatment Learning: Implementation and Application

The Results

Accuracy using Naïve Bayes (avg increase = 0.8%)

Accuracy using C4.5 (avg decrease = 0.9%)

Number of attributes

Page 16: Treatment Learning: Implementation and Application

Compare to other FSS methods

# of attributes selected (C4.5)

# of attributes selected (Naive Bayes)

17/20, fewest attributes selected
Further evidence for funnels

Page 17: Treatment Learning: Implementation and Application

Applications of Treatment Learning

Download site: http://www.ece.ubc.ca/~yingh/
Collaborators: JPL, WV, Portland, Miami
Application examples:
– pair programming vs. conventional programming
– identify software metrics that are superior error indicators
– identify attributes that make FSMs easy to test
– find the best software inspection policy for a particular software development organization
Other applications:
– 1 journal, 4 conference, 6 workshop papers

Page 18: Treatment Learning: Implementation and Application

Main Contributions

• New learning approach
• A novel mining algorithm
• Algorithm optimization
• Complete package and online distribution
• Narrow funnel effect
• Treatment learner as FSS
• Application in various research domains

Page 19: Treatment Learning: Implementation and Application

======================

Some notes follow

Page 20: Treatment Learning: Implementation and Application

Rx Definition example

Input example:
– a classified dataset
Output example:
– Rx = conjunction of attribute-value pairs
– confidence(Rx w.r.t. C) = P(C | Rx)

Page 21: Treatment Learning: Implementation and Application

TAR2 in practice

Domains containing narrow funnels show:
– A tail in the confidence1 distribution
– A small number of variables with disproportionately large confidence1 values
– Satisfactory Rx of small size (< 6)

Page 22: Treatment Learning: Implementation and Application

Background: Classification

2-step procedure
– The learning phase
– The testing phase

Strategies employed
– Eager learning
  • Decision tree induction (e.g. C4.5)
  • Neural networks (e.g. backpropagation)
– Lazy learning
  • Nearest neighbor classifiers (e.g. k-nearest neighbor)

Page 23: Treatment Learning: Implementation and Application

Background: Association Rule

Possible rule: B => C,E [support = 2%, confidence = 80%]

where
support(X -> Y) = P(X)
confidence(X -> Y) = P(Y | X)

Representative algorithms
– APRIORI
  • Apriori property of large itemsets
– Max-Miner
  • More concise representation of the discovered rules
  • Different pruning strategies

ID | Transactions
---|--------------
1  | A, B, C, E, F
2  | B, C, E
3  | B, C, D, E
4  | … …
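Using the slide's own definitions (note that it defines support(X -> Y) as P(X)), both measures can be checked against the three transactions shown; the elided fourth transaction is left out.

```python
# The three transactions listed on the slide
transactions = [
    {"A", "B", "C", "E", "F"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
]

def support(x):
    """support(X -> Y) = P(X), per the slide's definition."""
    return sum(1 for t in transactions if x <= t) / len(transactions)

def confidence(x, y):
    """confidence(X -> Y) = P(Y | X)."""
    has_x = [t for t in transactions if x <= t]
    return sum(1 for t in has_x if y <= t) / len(has_x)

print(support({"B"}), confidence({"B"}, {"C", "E"}))  # 1.0 1.0
```

On just these three transactions the rule B => C,E holds everywhere B does, so both values are 1.0; the 2%/80% figures on the slide come from a larger dataset.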

Page 24: Treatment Learning: Implementation and Application

Background: Extension

CBA classifier
– CBA = Classification Based on Association
– X => Y, where Y = class label
– More accurate than C4.5 (16/26)

JEP classifier
– JEP = Jumping Emerging Patterns
  • Support(X w.r.t. D1) = 0, Support(X w.r.t. D2) > 0
  • Model: a collection of JEPs
  • Classify: maximum collective impact
– More accurate than both C4.5 & CBA (15/25)

Page 25: Treatment Learning: Implementation and Application

Background: Standard FSS Method

• Information Gain attribute ranking
• Relief
• Principal Component Analysis (PCA)
• Correlation-based feature selection
• Consistency-based subset evaluation
• Wrapper subset evaluation

Page 26: Treatment Learning: Implementation and Application

Comparison

Relation to classification
– Class boundary / class density
– Class weighting

Relation to association rule mining
– Multiple classes / no class
– Confidence-based pruning

Relation to change detection algorithms
– support: |P(X|y=c1) − P(X|y=c2)|
– confidence: |P(y=c1|X) − P(y=c2|X)|
– Bayes' rule

Page 27: Treatment Learning: Implementation and Application

Confidence Property

Universal-existential upward closure
R1: Age.young -> Salary.low
R2: Age.young, Gender.m -> Salary.low
R3: Age.young, Gender.f -> Salary.low

Long rules tend to have high confidence
Large Rx tend to have high lift values

Page 28: Treatment Learning: Implementation and Application

TAR3: Usability

Usability: more user-friendly
– Intuitive default settings