
Treatment Learning: Implementation and Application

Ying Hu

Electrical & Computer Engineering

University of British Columbia

Ying Hu http://www.ece.ubc.ca/~yingh 2

Outline

1. An example
2. Background review
3. TAR2 treatment learner
   • TARZAN: Tim Menzies
   • TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
   • TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion


First Impression

Boston Housing Dataset (506 examples, 4 classes)

• C4.5's decision tree: (large tree shown on slide)
• Treatment learner: two short rules
  – high: 6.7 <= rooms < 9.8 and 12.6 <= parent-teacher ratio < 15.9
  – low: 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39


Review: Background

What is KDD?
– KDD = Knowledge Discovery in Databases [fayyad96]
– Data mining: one step in the KDD process
– Machine learning: the learning algorithms

Common data mining tasks
– Classification
  • Decision tree induction (C4.5) [quinlan86]
  • Nearest neighbors [cover67]
  • Neural networks [rosenblatt62]
  • Naive Bayes classifier [duda73]
– Association rule mining
  • APRIORI algorithm [agrawal93]
  • Variants of APRIORI


Treatment Learning: Definition

– Input: a classified dataset
  • Assumption: the classes are ordered
– Output: Rx = a conjunction of attribute-value pairs
  • Size of Rx = number of pairs in the Rx
– confidence(Rx w.r.t. Class) = P(Class | Rx)
– Goal: find an Rx whose confidence differs across the classes
– Evaluating an Rx: lift
– Output is shown in a visualization form
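The confidence and lift measures above can be sketched in a few lines of Python. This is a minimal illustration over a list-of-dicts dataset, not TAR2's actual implementation; in particular, TAR2's lift also weights the full class distribution, while this version compares against a single class's baseline frequency:

```python
def confidence(rows, rx, cls):
    """P(cls | Rx): among rows matching every attribute-value pair
    in the treatment rx, the fraction belonging to class cls."""
    matched = [r for r in rows if all(r.get(a) == v for a, v in rx.items())]
    if not matched:
        return 0.0
    return sum(r["class"] == cls for r in matched) / len(matched)

def lift(rows, rx, cls):
    """Simplified lift: confidence in the treated subset divided by
    the baseline frequency of cls in the whole dataset."""
    base = sum(r["class"] == cls for r in rows) / len(rows)
    return confidence(rows, rx, cls) / base
```

A lift above 1 means the treatment concentrates the desired class relative to the untreated data.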


Motivation: Narrow Funnel Effect

When is enough learning enough?
– Using fewer than 50% of the attributes decreases accuracy by only 3-5% [shavlik91]
– A 1-level decision tree is comparable to C4 [holte93]
– Data engineering: ignoring 81% of the features resulted in a 2% increase in accuracy [kohavi97]
– Scheduling: random sampling outperforms complete (depth-first) search [crawford94]

Narrow funnel effect
– Control variables vs. derived variables
– Treatment learning: finding the funnel variables


TAR2: The Algorithm

Search + attribute utility estimation
– Estimation heuristic: confidence1
– Search: depth-first search
  • Search space: pairs with confidence1 > threshold

Discretization: equal-width interval binning

Reporting Rx
– lift(Rx) > threshold

Software package and online distribution
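Equal-width interval binning, the discretization step named above, can be sketched as follows. This is a generic illustration of the technique, not TAR2's source:

```python
def equal_width_bins(values, n_bins):
    """Discretize a numeric attribute into n_bins equal-width
    intervals; returns the bin index (0..n_bins-1) of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard the all-equal case
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Each continuous attribute thus becomes a small set of discrete attribute-value pairs that the search can enumerate.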


The Pilot Case Study

Requirement optimization
– Goal: find an optimal set of mitigations in a cost-effective manner

Model (shown as a diagram on the slide):
– Mitigations incur Cost and reduce Risks
– Risks relate to Requirements
– Requirements achieve Benefit

Iterative learning cycle
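The cost-benefit structure above can be illustrated with a toy scoring function. Everything here, including the multiplicative impact model and all the names, is a hypothetical sketch for intuition, not the pilot study's actual model:

```python
def cost_benefit(selected, mit_cost, reduces, relates, req_benefit):
    """Toy model (assumed, not the pilot's real equations): selected
    mitigations incur cost and reduce risks; unmitigated risks related
    to a requirement scale down the benefit that requirement achieves."""
    cost = sum(mit_cost[m] for m in selected)
    benefit = 0.0
    for req, value in req_benefit.items():
        attained = 1.0
        for risk, impact in relates[req].items():
            # strongest reduction of this risk among the selected mitigations
            cut = max((reduces[m].get(risk, 0.0) for m in selected), default=0.0)
            attained *= 1.0 - impact * (1.0 - cut)
        benefit += value * attained
    return cost, benefit
```

A treatment learner run over many random mitigation subsets can then look for the few mitigations that most often land a subset in the high-benefit, low-cost region.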


The Pilot Study (continued)

Cost-benefit distribution (30/99 mitigations)

Compared to simulated annealing


Problem of TAR2

Runtime vs. Rx size
– To generate an Rx of size r, TAR2 enumerates every size-r subset of attribute-value pairs
– To generate Rx of every size in [1..N], these costs are summed, so runtime grows combinatorially
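The blow-up is easy to see by counting candidates. Assuming a pool of `num_pairs` discretized attribute-value pairs (the parameter name is illustrative), exhaustive enumeration must touch:

```python
from math import comb

def candidate_count(num_pairs, max_size):
    """Candidate treatments an exhaustive search over subsets of
    attribute-value pairs must consider for sizes 1..max_size."""
    return sum(comb(num_pairs, r) for r in range(1, max_size + 1))
```

For example, with 99 pairs, treatments up to size 4 alone already number close to four million, which is why a smarter generation strategy is needed.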


TAR3: The Improvement

Random sampling
– Key idea:
  • Treat the confidence1 distribution as a probability distribution
  • Sample Rx from the confidence1 distribution
– Steps:
  • Place the items (ai) in increasing order of confidence1 value
  • Compute the CDF of each ai
  • Sample a uniform value u in [0..1]
  • The sample is the first ai whose CDF > u
– Repeat until an Rx of the given size is obtained
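The sampling steps above amount to inverse-CDF sampling; a generic sketch (TAR3's own code may differ in detail):

```python
import random

def sample_item(items, scores, u=None):
    """Sample one item in proportion to its confidence1 score:
    order items by increasing score, accumulate the normalized CDF,
    and return the first item whose CDF exceeds a uniform draw u."""
    order = sorted(range(len(items)), key=lambda i: scores[i])
    total = sum(scores)
    if u is None:
        u = random.random()
    cdf = 0.0
    for i in order:
        cdf += scores[i] / total
        if cdf > u:
            return items[i]
    return items[order[-1]]  # guard against floating-point round-off
```

Repeating this draw (without replacement) until the desired size is reached yields one candidate Rx, biased toward high-confidence1 pairs but still able to explore low-scoring ones.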


Comparison of Efficiency

Charts (shown on slide):
– Runtime vs. data size
– Runtime vs. attribute # (R² = 0.9436)
– Runtime vs. Rx (treatment) size (R² = 0.8836)
– Runtime vs. TAR2


Comparison of Results

– Mean and STD in each round
– Final Rx: TAR2 = 19, TAR3 = 20
– 10 UCI domains: identical best Rx
– pilot2 dataset (58 × 30k)


External Evaluation

FSS framework (shown as a diagram on the slide):
– All attributes (10 UCI datasets) → learning
– TAR2 as feature subset selector → fewer attributes → learning
– Compare accuracy of the two runs
– Learners: C4.5 and Naive Bayes
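The framework above amounts to running the same learner twice and comparing accuracies. A minimal sketch with a pluggable learner; all names here are illustrative, and `learn` is a stand-in for C4.5 or Naive Bayes, not either implementation:

```python
def project(rows, attrs):
    """Keep only the chosen attributes of each (features, label) row."""
    return [({a: f[a] for a in attrs}, y) for f, y in rows]

def fss_compare(train, test, learn, all_attrs, selected):
    """Train the same learner on all attributes and on the selected
    subset; return (accuracy_all, accuracy_subset)."""
    def accuracy(attrs):
        model = learn(project(train, attrs))  # model maps features -> label
        scored = project(test, attrs)
        return sum(model(f) == y for f, y in scored) / len(scored)
    return accuracy(all_attrs), accuracy(selected)
```

If the subset's accuracy stays close to the all-attributes accuracy, the discarded attributes were not pulling their weight, which is the funnel argument in miniature.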


The Results

Charts (shown on slide):
– Accuracy using Naive Bayes (avg increase = 0.8%)
– Accuracy using C4.5 (avg decrease = 0.9%)
– Number of attributes


Compared to Other FSS Methods

– # of attributes selected (C4.5)
– # of attributes selected (Naive Bayes)
– 17/20: fewest attributes selected
– More evidence for funnels


Applications of Treatment Learning

Download site: http://www.ece.ubc.ca/~yingh/
Collaborators: JPL, WV, Portland, Miami

Application examples:
– pair programming vs. conventional programming
– identifying software metrics that are superior error indicators
– identifying attributes that make FSMs easy to test
– finding the best software inspection policy for a particular software development organization

Other applications:
– 1 journal, 4 conference, 6 workshop papers


Main Contributions

– A new learning approach
– A novel mining algorithm
– Algorithm optimization
– A complete package and online distribution
– The narrow funnel effect
– Treatment learner as FSS
– Applications in various research domains